Bridging the Inflection Morphology Gap for Arabic Statistical Machine Translation
Abstract
Statistical machine translation (SMT) relies on effectively learning word and phrase relationships from parallel corpora, a process that becomes considerably more difficult when the extent of morphological expression differs significantly between the source and target languages. We present techniques that select appropriate word segmentations in the morphologically rich source language based on contextual relationships in the target language. Our approach takes advantage of existing word-level morphological analysis components to improve translation quality beyond the state of the art on a limited-data Arabic-to-English speech translation task.
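To make the idea of choosing a source-side segmentation from target-language context concrete, here is a minimal, hypothetical toy sketch. It is not the paper's algorithm. For each Arabic word (in Buckwalter transliteration), it splits off a single proclitic such as w+ ("and") only when the paired English sentence contains a word that the clitic plausibly translates to. The clitic lexicon `PROCLITICS` and the functions `candidate_segmentations`, `choose_segmentation`, and `segment_sentence` are illustrative assumptions. A real system would draw candidates from a full morphological analyzer and use word alignments rather than a bag-of-words check.

```python
# Hypothetical toy sketch (NOT the paper's method): pick a segmentation for
# each Arabic source word by checking the English target sentence for a word
# that a candidate proclitic would translate to.

# Assumed toy lexicon: Buckwalter proclitics -> English words they often map to.
PROCLITICS = {
    "w": {"and"},                # conjunction wa-
    "b": {"by", "with", "in"},   # preposition bi-
    "l": {"to", "for"},          # preposition li-
}


def candidate_segmentations(word):
    """Return the unsegmented word plus every single-proclitic split of it."""
    candidates = [[word]]
    for clitic in PROCLITICS:
        # Require a stem of at least two characters after the clitic.
        if word.startswith(clitic) and len(word) > len(clitic) + 1:
            candidates.append([clitic + "+", word[len(clitic):]])
    return candidates


def choose_segmentation(word, target_tokens):
    """Split off a proclitic only if the target sentence supports it."""
    target = {t.lower() for t in target_tokens}
    for candidate in candidate_segmentations(word):
        if len(candidate) == 2:
            clitic = candidate[0].rstrip("+")
            if PROCLITICS[clitic] & target:
                return candidate
    return [word]  # fall back to the unsegmented word


def segment_sentence(source_tokens, target_tokens):
    """Segment every source token using the same target-side context."""
    segmented = []
    for word in source_tokens:
        segmented.extend(choose_segmentation(word, target_tokens))
    return segmented


if __name__ == "__main__":
    # wktb = "and (he) wrote", Alwld = "the boy"
    print(segment_sentence(["wktb", "Alwld"], ["and", "the", "boy", "wrote"]))
    # wzyr = "minister": the initial w belongs to the stem
    print(segment_sentence(["wzyr"], ["the", "minister", "spoke"]))
```

In this toy run, wktb is split into w+ and ktb because the target contains "and", while wzyr stays whole because its initial w is part of the stem and no "and" appears. The same surface check would wrongly split wzyr in a sentence that happens to contain "and", which is one reason a context-sensitive, learned model is preferable to a fixed heuristic like this one.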
Similar resources
Morpho-syntactic Arabic Preprocessing for Arabic to English Statistical Machine Translation
The Arabic language has far richer systems of inflection and derivation than English, which has very little morphology. This morphology difference causes a large gap between the vocabulary sizes in any given parallel training corpus. Segmentation of inflected Arabic words is a way to smooth its highly morphological nature. In this paper, we describe some statistically and linguistically motivate...
Applying Morphology Generation Models to Machine Translation
We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages. Our inflection generation models are trained independently of the SMT system. We investigate different ways of combining the inflection prediction component with the SMT syst...
Translate, Predict or Generate: Modeling Rich Morphology in Statistical Machine Translation
We compare three methods of modeling morphological features in statistical machine translation (SMT) from English to Arabic, a morphologically rich language. Features can be modeled as part of the core translation process mapping source tokens to target tokens. Alternatively, these features can be generated using target monolingual context as part of a separate generation (or post-translation in...
Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
In this paper, we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination of multiple mixture components. Each component contains a large set of features trained in a maximum-entropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weight...
Using Linguistic Knowledge in Statistical Machine Translation
In this thesis, we present methods for using linguistically motivated information to enhance the performance of statistical machine translation (SMT). One of the advantages of the statistical approach to machine translation is that it is largely language-agnostic. Machine learning models are used to automatically learn translation patterns from data. SMT can, however, be improved by using lingui...